CLAIMS 



What is claimed is: 

1. A method of performing a discrete cosine transform (DCT) using a microprocessor 
having an instruction set that includes SIMD floating point instructions, wherein the 
method comprises: 

receiving a block of integer data having C columns and R rows, wherein the block 
of integer data is indicative of a portion of an image; and 

for each row, 

loading the row data into registers;

converting the row data into floating point form, wherein the registers each
hold two floating point row data values; and

performing a plurality of weighted-rotation operations on the values in the
registers, wherein the weighted-rotation operations are performed
using SIMD floating point instructions.

2. The method of claim 1, wherein said converting is accomplished using the pi2fw 
instruction. 

3. The method of claim 1, wherein said weighted-rotation operations are accomplished
using the pswap, pfmul, and pfpnacc instructions.

4. The method of claim 1, further comprising: 

for each row, 

altering the arrangement of values in the registers;

performing a second plurality of weighted-rotation operations on the
values in the registers;

again altering the arrangement of the values in the registers;

performing a third plurality of weighted-rotation operations on the values
in the registers;

yet again altering the arrangement of the values in the registers; and

performing a fourth plurality of weighted-rotation operations on the values
in the registers to obtain intermediate floating point values.

5. The method of claim 4, further comprising: 

for each row, 

storing the intermediate floating point values to an intermediate buffer. 

6. The method of claim 5, further comprising: 

for two columns at a time, 

loading data from two columns of intermediate data into each of a
plurality of registers;

performing a plurality of weighted-rotation operations on the values in the
registers, wherein the weighted-rotation operations for two
columns are performed in parallel using SIMD floating point
instructions.

7. The method of claim 6, wherein said weighted-rotation operations for two columns at a
time are accomplished using pfmul, pfsub, and pfadd instructions.

8. The method of claim 6, further comprising: 

for two columns at a time, 

as each weighted-rotation operation is done, storing weighted-rotation 
operation results to the intermediate buffer. 

9. The method of claim 8, further comprising: 

for two columns at a time, 

retrieving weighted-rotation operation results from the intermediate buffer;

performing a second plurality of weighted-rotation operations on the
retrieved values;

again storing weighted-rotation operation results to the intermediate buffer 
as the weighted-rotation operations of the second plurality are 
done; 

again retrieving weighted-rotation operation results from the intermediate 
buffer; 

performing a third plurality of weighted-rotation operations on the 
retrieved values; 

yet again storing weighted-rotation operation results to the intermediate 
buffer as the weighted-rotation operations of the third plurality are 
done; 

yet again retrieving weighted-rotation operation results from the
intermediate buffer;

performing a fourth plurality of weighted-rotation operations on the
retrieved values;

converting the weighted-rotation operation results from the fourth plurality 
to integer results. 

10. The method of claim 9, further comprising:

for two columns at a time, writing the integer results to an output buffer. 

11. A method of performing a discrete cosine transform (DCT) using a microprocessor 
having an instruction set that includes SIMD floating point instructions, wherein the 
method comprises: 

receiving a block of integer data having C columns and R rows; and 
for two columns at a time, 

loading column data into registers; 

converting the column data into floating point form, wherein the registers 
each hold a floating point column data value from two columns; 
and 

performing a plurality of weighted-rotation operations on the values in the 
registers, wherein the weighted-rotation operations for two 
columns are performed in parallel using SIMD floating point 
instructions. 

12. The method of claim 11, wherein said weighted-rotation operations are accomplished
using pfmul, pfsub, and pfadd instructions.

13. The method of claim 11, further comprising:

for two columns at a time, 

as each weighted-rotation operation is done, storing weighted-rotation 
operation results to an intermediate buffer. 

14. The method of claim 13, further comprising: 

for two columns at a time, 

retrieving weighted-rotation operation results from the intermediate buffer;

performing a second plurality of weighted-rotation operations on the
retrieved values;

again storing weighted-rotation operation results to the intermediate buffer
as the weighted-rotation operations of the second plurality are
done;

again retrieving weighted-rotation operation results from the intermediate
buffer;

performing a third plurality of weighted-rotation operations on the
retrieved values;

yet again storing weighted-rotation operation results to the intermediate
buffer as the weighted-rotation operations of the third plurality are
done;

yet again retrieving weighted-rotation operation results from the
intermediate buffer;

performing a fourth plurality of weighted-rotation operations on the
retrieved values;

converting the weighted-rotation operation results from the fourth plurality 
to integer results. 

15. The method of claim 14, further comprising: 

for two columns at a time, writing the integer results to an output buffer.

16. A computer system comprising: 

a processor having an instruction set that includes SIMD floating point 
instructions; and 

a memory coupled to the processor, wherein the memory stores software 
instructions executable by the processor to implement the method of 
receiving a block of integer data having C columns and R rows, wherein 
the block of integer data is indicative of a portion of an image; and 

for each row, 

loading the row data into registers; 

converting the row data into floating point form, wherein the registers each 
hold two floating point row data values; and 

performing a plurality of weighted-rotation operations on the values in the
registers, wherein the weighted-rotation operations are performed
using SIMD floating point instructions.



17. A carrier medium comprising software instructions executable by a microprocessor 
having an instruction set that includes SIMD floating point instructions to implement a 
method of performing discrete cosine transform (DCT), wherein the method comprises: 

receiving a block of integer data having C columns and R rows, wherein the block
of integer data is indicative of a portion of an image; and

for each row,

loading the row data into registers;

converting the row data into floating point form, wherein the registers each
hold two floating point row data values; and

performing a plurality of weighted-rotation operations on the values in the
registers, wherein the weighted-rotation operations are performed
using SIMD floating point instructions.
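
Editorial illustration (not part of the claims): the row-wise steps recited above (load, convert to floating point with two values per register, apply a weighted rotation) can be sketched in portable C. This is only a scalar model under stated assumptions. The claims do not define the form of a weighted rotation, so the sketch assumes a general 2x2 linear combination of the two values held in one register. The names `pair_f32`, `pair_from_ints`, `weighted_rotation`, and `rotate_row` are hypothetical, and the coefficients `a`, `b`, `c`, `d` are placeholders rather than actual DCT weights.

```c
#include <stddef.h>

/* Two single-precision values, standing in for one 64-bit SIMD register
 * that holds two packed floats (hypothetical model, not a real intrinsic). */
typedef struct {
    float lo;
    float hi;
} pair_f32;

/* Convert two integer samples to floating point form, so that one
 * "register" holds two floating point data values. */
pair_f32 pair_from_ints(int first, int second)
{
    pair_f32 p;
    p.lo = (float)first;
    p.hi = (float)second;
    return p;
}

/* Assumed form of a weighted rotation on the pair (x.lo, x.hi):
 *   y.lo = a * x.lo + b * x.hi
 *   y.hi = c * x.lo + d * x.hi
 * The coefficients are caller-supplied placeholders. */
pair_f32 weighted_rotation(pair_f32 x, float a, float b, float c, float d)
{
    pair_f32 y;
    y.lo = a * x.lo + b * x.hi;
    y.hi = c * x.lo + d * x.hi;
    return y;
}

/* For one row: load adjacent samples two at a time, convert them, apply
 * the same weighted rotation to each pair, and store the float results.
 * If n_cols is odd, the last sample is left unprocessed. */
void rotate_row(const short *row, size_t n_cols,
                float a, float b, float c, float d, float *out)
{
    size_t i;
    for (i = 0; i + 1 < n_cols; i += 2) {
        pair_f32 x = pair_from_ints(row[i], row[i + 1]);
        pair_f32 y = weighted_rotation(x, a, b, c, d);
        out[i] = y.lo;
        out[i + 1] = y.hi;
    }
}
```

In a SIMD implementation each `pair_f32` would occupy a single register and both products of a rotation would be computed by one packed multiply; the scalar model only makes the data flow explicit.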